Filtered metabolomics data using a two-step process: blank subtraction followed by 80% cumulative signal threshold.
| Category | Count | Description |
|---|---|---|
| Sample Only | | Peaks only in samples, not in blanks (auto-kept) |
| Passed Validation | | Passed both fold-change (≥3x) AND statistical test (p<0.05, FDR-corrected) |
| Contamination | | Failed fold-change or statistical test (removed) |
| Blank Only | | Peaks only in blanks (not in samples) |
| No Signal | | No detectable signal in any sample or blank |
| Criterion | Passed | Failed | Description |
|---|---|---|---|
| Fold-Change ≥3x | | | Sample mean ≥ 3× blank mean (pmp/Bioconductor) |
| Welch's t-test | | | p < 0.05 (FDR-corrected, one-sided) |
| Insufficient Data | | | Not enough replicates for t-test (used fold-change only) |
Root samples have more concentrated signal - fewer peaks make up 80% of the total.
Each row shows one tissue sample and how many peaks were needed to account for 80% of its total signal (after blank filtering). Samples with fewer peaks needed have more concentrated signal.
| # | ID | Peaks Needed | Smallest Peak Kept |
|---|---|---|---|
Example: If a sample needed 500 peaks to reach 80% of its signal, those 500 peaks are the most abundant compounds. The smallest peak kept might contribute 0.02% - anything contributing less was filtered as noise.
Comparison of metabolite richness (peak counts) and Shannon diversity index across treatments.
Number of detected metabolites per sample (mean ± SE) out of total peaks
H = -Σ(p × ln(p)) where p = relative abundance (Vinaixa et al. 2012)
Data source: df_blank_filtered (after 3x blank subtraction)
For each sample column (e.g., "BL - Drought"):
richness = COUNT of rows WHERE value > 0.0
For each treatment:
mean = SUM(sample_richness_values) / n_samples
SE = STDEV(sample_richness_values) / SQRT(n_samples)
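The richness calculation above can be sketched in plain Python. This is a minimal illustration, not the pipeline's actual code: the sample names and abundance values are made up, and `STDEV` is assumed to mean the sample standard deviation (n - 1 denominator), which the text does not specify.

```python
import math
import statistics

def richness(abundances):
    """Count peaks whose abundance is strictly greater than zero."""
    return sum(1 for value in abundances if value > 0.0)

def mean_and_se(values):
    """Return (mean, standard error) for a list of per-sample values.

    Assumption: SE uses the sample standard deviation (n - 1 denominator).
    """
    n = len(values)
    mean = sum(values) / n
    se = statistics.stdev(values) / math.sqrt(n) if n > 1 else float("nan")
    return mean, se

# Hypothetical replicate columns for one treatment (values are invented)
samples = {
    "BL - Drought 1": [120.0, 0.0, 35.5, 8.2],
    "BL - Drought 2": [98.0, 4.1, 0.0, 0.0],
    "BL - Drought 3": [110.0, 2.0, 40.0, 7.5],
}
per_sample = [richness(values) for values in samples.values()]
treatment_mean, treatment_se = mean_and_se(per_sample)
```

With a single replicate the standard error is undefined, so the sketch returns NaN rather than zero.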
Data source: df_blank_filtered (after 3x blank subtraction)
For each sample column:
1. Get all peak abundances where value > 0.0
2. total = SUM(all abundances)
3. For each peak: p = abundance / total
4. H = -SUM(p × ln(p))
For each treatment:
mean = SUM(H_values) / n_samples
SE = STDEV(H_values) / SQRT(n_samples)
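The per-sample Shannon calculation can be sketched the same way. The function name `shannon_index` is hypothetical, and returning 0.0 for a sample with no signal is an assumption made here to avoid dividing by zero.

```python
import math

def shannon_index(abundances):
    """Shannon diversity H = -sum(p * ln(p)) over peaks with abundance > 0."""
    positive = [a for a in abundances if a > 0.0]
    total = sum(positive)
    if total == 0:
        return 0.0  # assumption: an empty sample has zero diversity
    return -sum((a / total) * math.log(a / total) for a in positive)

# Four equally abundant peaks give the maximum H for four peaks, ln(4)
even_sample = [10.0, 10.0, 10.0, 10.0]
h_even = shannon_index(even_sample)
```

A sample dominated by a single peak yields H close to 0, while evenly spread signal pushes H toward ln(number of peaks).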
Why blank-filtered, not 80%-filtered? The 80% cumulative filter removes low-abundance peaks to focus analysis. But richness and diversity metrics should capture the FULL metabolome complexity, so we use the larger blank-filtered dataset.
Toggle treatments to compare. Shows peaks that differ between selected treatments.
Shows how peaks are distributed across treatment groups (Drought, Ambient, Watered). "Unique" means the peak is ONLY found in that treatment, not in the others.
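One way to compute the "unique" peaks per treatment is with set differences. This is an illustrative sketch: the function name and the example peak IDs are invented, and a peak is assumed to count as "found" in a treatment if it appears in that treatment's set.

```python
def unique_peaks(peaks_by_treatment):
    """Map each treatment to the peaks found in it and in no other treatment."""
    unique = {}
    for name, peaks in peaks_by_treatment.items():
        others = set()
        for other_name, other_peaks in peaks_by_treatment.items():
            if other_name != name:
                others |= set(other_peaks)
        unique[name] = set(peaks) - others
    return unique

# Hypothetical peak IDs per treatment
groups = {
    "Drought": {"a", "b", "c"},
    "Ambient": {"b", "c", "d"},
    "Watered": {"c", "e"},
}
unique = unique_peaks(groups)
```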
How peaks are shared between leaf and root tissues.
| Region | Peaks | Meaning |
|---|---|---|
| Leaf Only | | Detected in leaf but not root |
| Root Only | | Detected in root but not leaf |
| Both Tissues | | Detected in both leaf and root |
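The three regions in the table correspond to two set differences and one intersection. A minimal sketch, with a hypothetical function name and made-up peak IDs:

```python
def tissue_overlap(leaf_peaks, root_peaks):
    """Split peaks into the three regions used in the table above."""
    leaf, root = set(leaf_peaks), set(root_peaks)
    return {
        "Leaf Only": leaf - root,
        "Root Only": root - leaf,
        "Both Tissues": leaf & root,
    }

# Hypothetical peak IDs detected in each tissue
regions = tissue_overlap({"p1", "p2", "p3"}, {"p3", "p4"})
counts = {region: len(peaks) for region, peaks in regions.items()}
```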
We use a two-step filtering approach to ensure data quality: first removing contamination, then keeping only the most abundant peaks.
Blanks are samples run through the instrument with no plant material - they capture background contamination from solvents, plastics, and the instrument itself.
For each peak found in BOTH samples and blanks:

1. Calculate fold-change: sample_mean / blank_mean
2. Perform Welch's t-test (one-sided: sample > blank)
3. Apply Benjamini-Hochberg FDR correction for multiple testing
4. KEEP only if BOTH criteria pass:
   - Fold-change ≥ 3x (biological significance, per pmp/Bioconductor)
   - FDR-adjusted p-value < 0.05 (statistical significance)

Peaks ONLY in samples (not in blanks) → auto-KEEP
Why dual criteria? The 3x fold-change threshold (pmp/Bioconductor standard) ensures biological relevance (a peak must be meaningfully higher in samples). The statistical test ensures the difference isn't due to random variation. FDR correction accounts for testing thousands of peaks simultaneously.
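The decision rule can be sketched as follows. This is an illustration under stated assumptions, not the pmp implementation: the raw p-values are assumed to come from a one-sided Welch's t-test computed elsewhere (for example with SciPy's `ttest_ind` using `equal_var=False` and `alternative="greater"`), and the function names are hypothetical. Passing `None` as the adjusted p-value stands in for the "insufficient data" case, where only fold-change is used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def keep_peak(sample_mean, blank_mean, adjusted_p, min_fold=3.0, alpha=0.05):
    """Apply the dual criteria to one peak."""
    if blank_mean == 0:
        return sample_mean > 0  # peak only in samples: auto-keep
    fold_ok = sample_mean / blank_mean >= min_fold
    if adjusted_p is None:
        return fold_ok  # too few replicates for a t-test: fold-change only
    return fold_ok and adjusted_p < alpha

# Hypothetical raw p-values for four peaks
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```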
After removing contamination, we filter to keep only the most abundant peaks.
1. Take all peaks and their area values for one sample
2. Sort peaks from LARGEST to SMALLEST
3. Add up the areas as you go down the list
4. Stop when you've added up 80% of the total
5. Everything above that line is kept
Important: A compound is kept if it makes the cut in ANY sample. This ensures we don't lose peaks that are important in specific tissues.
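A minimal sketch of the cumulative threshold and the "kept in ANY sample" rule. The function names and example areas are hypothetical, and the peak that crosses the 80% mark is assumed to be kept, which the description leaves open.

```python
def peaks_for_cumulative_fraction(areas, fraction=0.80):
    """Return the IDs of the largest peaks that together reach `fraction` of the signal.

    `areas` maps peak ID -> area for ONE sample.
    Assumption: the peak that crosses the threshold is included.
    """
    positive = {peak_id: area for peak_id, area in areas.items() if area > 0}
    total = sum(positive.values())
    kept, running = set(), 0.0
    ranked = sorted(positive.items(), key=lambda item: item[1], reverse=True)
    for peak_id, area in ranked:
        if running >= fraction * total:
            break
        kept.add(peak_id)
        running += area
    return kept

def kept_in_any_sample(samples, fraction=0.80):
    """Union of the per-sample kept sets: a peak survives if any sample keeps it."""
    kept = set()
    for areas in samples.values():
        kept |= peaks_for_cumulative_fraction(areas, fraction)
    return kept

# Hypothetical areas for one leaf and one root sample
samples = {
    "leaf": {"a": 50.0, "b": 35.0, "c": 10.0, "d": 5.0},
    "root": {"a": 1.0, "b": 2.0, "c": 90.0, "d": 7.0},
}
final_peaks = kept_in_any_sample(samples)
```

In this example peak "c" would be dropped from the leaf sample alone but survives because the root sample keeps it.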
| Step | Purpose | What it removes |
|---|---|---|
| Blank Subtraction | Remove contamination | Plasticizers, solvent impurities, instrument background |
| 80% Threshold | Remove noise | Low-abundance peaks that contribute little to the biological profile |
After filtering, we assign molecular formulas to peaks using the MFAssignR R package, which calculates which chemical formulas could produce each measured mass.
For each peak's m/z value:

1. Calculate neutral mass: m/z - 1.007276 (remove proton from [M+H]+)
2. Find all formulas (combinations of C, H, O, N, S, P) that match within 3 ppm
3. Apply chemical rules to filter invalid formulas:
   - H/C ratio between 0.2 and 3.0
   - O/C ratio between 0 and 1.2
   - Nitrogen rule (even/odd mass)
   - Valid double bond equivalents (DBE)
4. Use isotope patterns (13C, 34S) to confirm assignments
5. Select best-matching formula
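The neutral-mass step and some of the chemical rules can be illustrated with a short sketch. This is not MFAssignR's code: the function names are hypothetical, the DBE formula assumes trivalent N and P, and "valid DBE" is assumed to mean a non-negative integer for the neutral formula. The nitrogen rule and isotope checks are left out.

```python
PROTON_MASS = 1.007276  # Da, the value used in the text for [M+H]+

def neutral_mass(mz):
    """Neutral monoisotopic mass for a peak detected as [M+H]+."""
    return mz - PROTON_MASS

def dbe(c, h, n=0, p=0):
    """Double bond equivalents, treating N and P as trivalent (assumption)."""
    return c - h / 2 + (n + p) / 2 + 1

def passes_basic_rules(c, h, o=0, n=0, s=0, p=0):
    """Check element limits, H/C and O/C ratios, and DBE for a candidate formula."""
    if c == 0:
        return False
    d = dbe(c, h, n, p)
    return (
        0.2 <= h / c <= 3.0
        and 0.0 <= o / c <= 1.2
        and d >= 0
        and d == int(d)  # assumption: a valid neutral formula has integer DBE
        and n <= 4 and s <= 2 and p <= 2
    )

# C26H50O4, the formula used in the ppm example in this document
ok = passes_basic_rules(c=26, h=50, o=4)
```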
| Parameter | Value | Meaning |
|---|---|---|
| Ion Mode | Positive [M+H]+ | Compounds detected as protonated molecules |
| Mass Error | 3 ppm | Maximum allowed difference between measured and theoretical mass |
| Mass Range | 100-1000 Da | Only assign formulas to peaks in this range |
| Elements | C, H, O, N≤4, S≤2, P≤2 | Allowed elements and maximum counts |
PPM (parts per million) measures how close the measured mass is to the theoretical formula mass:
ppm = (measured - theoretical) / theoretical × 1,000,000
Example: m/z 427.3778 vs C26H50O4+H theoretical 427.3782
ppm = (427.3778 - 427.3782) / 427.3782 × 1,000,000 = -0.9 ppm
Lower absolute ppm error = higher confidence. Our assignments average ppm, which is excellent.
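The ppm calculation and the worked example can be reproduced with a few lines of Python. The helper names are hypothetical; the monoisotopic masses are standard values rounded to eight decimals, and the proton mass is the one quoted earlier in this document.

```python
# Monoisotopic masses in Da (standard values, rounded)
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307400,
    "O": 15.99491462,
    "P": 30.97376200,
    "S": 31.97207117,
}
PROTON_MASS = 1.007276

def protonated_mz(counts):
    """Theoretical m/z of [M+H]+ for a formula given as element -> count."""
    neutral = sum(MONOISOTOPIC[element] * n for element, n in counts.items())
    return neutral + PROTON_MASS

def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1_000_000

theoretical = protonated_mz({"C": 26, "H": 50, "O": 4})
error = ppm_error(427.3778, 427.3782)
within_tolerance = abs(error) <= 3.0  # 3 ppm limit from the parameter table
```

The sign shows the direction of the error (measured below theoretical is negative); the 3 ppm tolerance applies to the absolute value.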
| Class | Elements | Typical Compounds |
|---|---|---|
| CHO | C, H, O only | Sugars, fatty acids, terpenes |
| CHNO | + Nitrogen | Amino acids, alkaloids |
| CHNOS | + Sulfur | Sulfur-containing amino acids |
| CHNOP | + Phosphorus | Phospholipids, nucleotides |
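Assigning a formula to one of these classes amounts to listing which elements it contains. A small sketch, with a hypothetical function name and a fixed element order chosen to match the class labels in the table:

```python
def element_class(counts):
    """Build a class label such as 'CHO' or 'CHNOS' from element counts."""
    order = ["C", "H", "N", "O", "S", "P"]
    return "".join(element for element in order if counts.get(element, 0) > 0)

# C26H50O4 from the ppm example contains only C, H, and O
label = element_class({"C": 26, "H": 50, "O": 4})
```

Formulas containing both sulfur and phosphorus would get a label outside the four classes listed (here 'CHNOSP'), so a real report would need a rule for those.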
Two-step filtered data: blank subtraction (3x threshold, pmp/Bioconductor) followed by 80% cumulative signal threshold.
Each compound is identified by a code like 3.90_564.1489n. This encodes two measurements:
| Part | Example | Meaning |
|---|---|---|
| First number | 3.90 | Retention time (minutes) - how long it took to pass through the column |
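| Second number | 564.1489 | Mass value detected for the compound (neutral mass or m/z, depending on the suffix) |
| Suffix | n or m/z | Notation for the mass value: n usually marks a neutral mass, m/z a mass-to-charge ratio |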
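Splitting such a code into its parts is straightforward with a regular expression. This sketch assumes IDs always follow the `retention-time_mass` pattern with an `n` or `m/z` suffix, as in the example; the function name and the returned keys are hypothetical, and the suffix is returned unchanged.

```python
import re

# retention time, underscore, mass, then the suffix "n" or "m/z"
_ID_PATTERN = re.compile(
    r"^(?P<rt>\d+(?:\.\d+)?)_(?P<mass>\d+(?:\.\d+)?)(?P<suffix>n|m/z)$"
)

def parse_compound_id(compound_id):
    """Split a compound code into retention time, mass value, and suffix."""
    match = _ID_PATTERN.match(compound_id.strip())
    if match is None:
        raise ValueError(f"Unrecognized compound ID: {compound_id!r}")
    return {
        "retention_time_min": float(match["rt"]),
        "mass": float(match["mass"]),
        "suffix": match["suffix"],
    }

parsed = parse_compound_id("3.90_564.1489n")
```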